52 research outputs found

    Introduction to Machine Learning and Bioinformatics

    Abstract not available (book review)

    Parallel Computing for Biological Data

    In the 1990s a number of technological innovations appeared that revolutionized biology, and 'Bioinformatics' became a new scientific discipline. Microarrays can measure the abundance of tens of thousands of mRNA species, the complete genomic sequences of many different organisms are available, and other technologies make it possible to study various processes at the molecular level. In bioinformatics and biostatistics, current research and computations are limited by the available computer hardware. However, this problem can be solved using high-performance computing resources. There are several reasons for the increased focus on high-performance computing: larger data sets, increased computational requirements stemming from more sophisticated methodologies, and recent developments in computer chip production. The open-source programming language 'R' was developed to provide a powerful and extensible environment for statistical and graphical techniques. There are many good reasons for preferring R to other software or programming languages for scientific computations (in statistics and biology). However, the development of the R language was not aimed at providing software for parallel or high-performance computing. Nonetheless, during the last decade, a great deal of research has been conducted on using parallel computing techniques with R. This PhD thesis demonstrates the usefulness of the R language and parallel computing for biological research. It introduces parallel computing with R, and reviews and evaluates existing techniques and R packages for parallel computing on computer clusters, on multi-core systems, and in grid computing. From a computer science point of view, the packages were examined as to their reusability in biological applications, and some upgrades were proposed. Furthermore, parallel applications for next-generation sequence data and preprocessing of microarray data were developed.
Microarray data are characterized by high levels of noise and bias. As these perturbations have to be removed, preprocessing of raw data has been a research topic of high priority over the past few years. A new Bioconductor package called affyPara for parallelized preprocessing of high-density oligonucleotide microarray data was developed and published. The data can be partitioned by array using a block cyclic partition, and, as a result, parallelization of algorithms becomes directly possible. Existing statistical algorithms and data structures had to be adjusted and reformulated for use in parallel computing. Using the new parallel infrastructure, normalization methods can be enhanced and new methods become available. Partitioning the data and distributing it to several nodes or processors solves the main memory problem and accelerates the methods by up to a factor of fifteen for 300 arrays or more. The final part of the thesis contains a large cancer study analysing more than 7000 microarrays from a publicly available database, and estimating gene interaction networks. For this purpose, a new R package for microarray data management was developed, and various challenges regarding the analysis of this amount of data are discussed. The comparison of gene networks for different pathways and different cancer entities in this larger amount of data partly confirms already established forms of gene interaction.
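
The block cyclic partition of arrays mentioned in this abstract can be illustrated with a minimal sketch. This is not affyPara's implementation (which is written in R); it is a Python illustration under the assumption that "block cyclic" means consecutive blocks of arrays are dealt out to the workers in round-robin order. The function name, the `block_size` parameter, and the CEL file names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_cyclic_partition(items: Sequence[T], n_workers: int,
                           block_size: int = 1) -> List[List[T]]:
    """Assign consecutive blocks of items to workers in round-robin order.

    Hypothetical helper for illustration; not part of affyPara.
    """
    if n_workers < 1 or block_size < 1:
        raise ValueError("n_workers and block_size must be positive")
    parts: List[List[T]] = [[] for _ in range(n_workers)]
    for start in range(0, len(items), block_size):
        block_index = start // block_size
        # Block k goes to worker k mod n_workers.
        parts[block_index % n_workers].extend(items[start:start + block_size])
    return parts


# Ten hypothetical array files spread over three workers, two arrays per block.
arrays = [f"array_{i:02d}.CEL" for i in range(10)]
partition = block_cyclic_partition(arrays, n_workers=3, block_size=2)
for worker, files in enumerate(partition):
    print(worker, files)
```

Each worker then only has to hold its own subset of arrays in memory, which is the property the abstract credits with solving the main memory problem.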

    affyPara—a Bioconductor Package for Parallelized Preprocessing Algorithms of Affymetrix Microarray Data

    Microarray data repositories as well as large clinical applications of gene expression make it possible to analyse several hundred microarrays at one time. The preprocessing of large numbers of microarrays is still a challenge. The algorithms are limited by the available computer hardware. For example, building classification or prognostic rules from large microarray sets will be very time consuming. Here, preprocessing has to be a part of the cross-validation and resampling strategy which is necessary to estimate the rule’s prediction quality honestly.

    State-of-the-Art in Parallel Computing with R

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix

    State of the Art in Parallel Computing with R

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.

    Determination of cell survival after irradiation via clonogenic assay versus multiple MTT Assay - A comparative study

    To study proliferation and determine the survival of cancer cells after irradiation, the multiple MTT assay, which is based on the reduction of a yellow water-soluble tetrazolium salt to a purple water-insoluble formazan dye by living cells, was modified from a single-point assay into a proliferation assay. This assay can be performed with a large number of samples in a short time using multi-well plates, and it can be run semi-automatically with a microplate reader. Survival, the calculated parameter in this assay, is determined mathematically. Exponential growth in both control and irradiated groups was proven as the underlying basis of the applicability of the multiple MTT assay. Its equivalence to a clonogenic survival assay, which has disadvantages such as time consumption, was proven in two setups, including plating of cells before and after irradiation. Three cell lines (A 549, LN 229 and F 98) were included in the experiment to study its applicability in principle and in general.

    Prospective, open, multi-centre phase I/II trial to assess safety and efficacy of neoadjuvant radiochemotherapy with docetaxel and oxaliplatin in patients with adenocarcinoma of the oesophagogastric junction

    Background: This phase I/II-trial assessed the dose-limiting toxicities (DLT) and maximum tolerated dose (MTD) of neoadjuvant radiochemotherapy (RCT) with docetaxel and oxaliplatin in patients with locally advanced adenocarcinoma of the oesophagogastric junction. Methods: Patients received neoadjuvant radiotherapy (50.4 Gy) together with weekly docetaxel (20 mg/m2 at dose level (DL) 1 and 2, 25 mg/m2 at DL 3) and oxaliplatin (40 mg/m2 at DL 1, 50 mg/m2 at DL 2 and 3) over 5 weeks. The primary endpoint was the DLT and the MTD of the RCT regimen. Secondary endpoints included overall response rate (ORR) and progression-free survival (PFS). Results: A total of 24 patients were included. Four patients were treated at DL 1, 13 patients at DL 2 and 7 patients at DL 3. The MTD of the RCT was considered DL 2 with docetaxel 20 mg/m2 and oxaliplatin 50 mg/m2. Objective response (CR/PR) was observed in 32% (7/22) of patients. Eighteen patients (75%) underwent surgery after RCT. The median PFS for all patients (n = 24) was 6.5 months. The median overall survival for all patients (n = 24) was 16.3 months. Patients treated at DL 2 had a median overall survival of 29.5 months. Conclusion: Neoadjuvant RCT with docetaxel 20 mg/m2 and oxaliplatin 50 mg/m2 was effective and showed a good toxicity profile. Future studies should consider the addition of targeted therapies to current neoadjuvant therapy regimens to further improve the outcome of patients with advanced cancer of the oesophagogastric junction. Trial Registration: NCT0037498

    Conceptual Aspects of Large Meta-Analyses with Publicly Available Microarray Data: A Case Study in Oncology

    Large public repositories of microarray experiments offer an abundance of biological data. It is of interest to use and combine the available material to create new biological information and to develop a broader view of biological phenomena.